A stitch in time saves nine:
a case of multiple OS vulnerability 




Agenda 



• CERT VU#649219 overview 



• Crash course on ring transitions on x86_64 

• Exploit techniques 

• Related musings 
PART 1: CERT VU#649219 overview







CERT VU#649219 




SYSRET 64-bit operating system privilege 
escalation vulnerability on Intel CPU hardware 

- Escalation from untrusted user to kernel 


Root cause: on Intel CPUs, a "sysret" instruction
executed with a non-canonical return address
raises an exception in ring0

Patches released on 12 June 2012 
Known affected systems (in April 2012)

• Only 64-bit OS versions running on Intel CPUs
are vulnerable

• Xen with PV guests 

• Windows 7 and Windows 2008 R2 

• FreeBSD 

• NetBSD 
Coordinating patches release

Xen security team

Other affected software vendors 

Intel 

US CERT 

Bromium 

Thank you all 
Known non-affected systems

Apple OSX
OpenBSD >=5.0

- Most likely, accidentally (?) fixed on 4 Jul 2011
during code cleanup

Linux kernel >= 2.6.16.5

- Consciously fixed the root cause in 2006

So why were so many systems still vulnerable
after 2006?
CVE-2006-0774: Linux kernel before 2.6.16.5
does not properly handle uncanonical return
addresses on Intel EM64T CPUs, which reports
an exception in the SYSRET instead of the next
instruction, which causes the kernel exception
handler to run on the user stack with the
wrong GS.

Impact not specified explicitly; Linux-specific
BID 17541: Linux Kernel Intel EM64T SYSRET
Local Denial of Service Vulnerability

CVE-2006-0774 and BID 17541 suggest this is
a Linux-specific problem, DoS only

- We will see it is not

Apparently, other vendors did not notice the
problem, and were not warned

Hopefully, this talk is an explicit warning






On server systems, with untrusted users, 
obviously yes 


On desktop systems, they allow escaping
from many sandboxing solutions

- Really, no need to chain 10 different bugs

Issues affecting multiple OSes are very rare, so this case is
interesting


Black Hat USA 2012



PART 2: Crash course on ring transitions on x86_64
More on stack switch mechanism

Always used when changing ring

Usually, not used when not changing ring

The Interrupt Stack Table feature can force a
stack switch even when an exception happens in
ring0

- Normally used only for catastrophic exceptions
like #MC, #DF and NMI
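
The stack switch rules above can be sketched as a small decision function. This is a simplified model, not a description of any particular kernel: the function name and parameters are hypothetical, and it assumes only two inputs matter, namely whether the privilege level changes and whether the IDT gate selects a non-zero Interrupt Stack Table index.

```python
def switches_stack(cpl_at_fault, handler_cpl, ist_index):
    """Simplified model: does the CPU load a new RSP when delivering an exception?

    cpl_at_fault: ring in which the exception happened (0..3)
    handler_cpl:  ring in which the handler runs (0 for kernel handlers)
    ist_index:    IST field of the IDT gate, 0 means "IST not used"
    """
    if ist_index != 0:
        # IST selected for this vector: the stack is switched unconditionally.
        return True
    # Otherwise the stack is switched only when the ring changes.
    return handler_cpl < cpl_at_fault


print(switches_stack(3, 0, 0))  # exception in ring3, handler in ring0
print(switches_stack(0, 0, 0))  # exception in ring0, no IST
print(switches_stack(0, 0, 1))  # exception in ring0, vector uses IST
```

The middle case is the one that matters for this talk: an exception raised in ring0 on a vector without IST keeps whatever RSP was loaded at that moment.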
• Do the actual job

• Restore usermode RSP 
Exception in syscall handler...

... when RSP is still usermode-provided, is
dangerous

RSP assignments should not fault

Can "sysret" throw an exception?
AMD64 Technology, 24594, Rev. 3.18, March 2012
Instruction Reference: SYSRET (page 395)

rFLAGS Affected

[Table: rFLAGS bits ID (21), VIP (20), VIF (19), AC (18), VM (17), RF (16), NT (14), IOPL (13:12), OF (11), DF (10), IF (9), TF (8), SF (7), ZF (6), AF (4), PF (2), CF (0), most marked M]

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags
are blank. Undefined flags are U.

Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The SYSCALL and SYSRET instructions are not supported, as indicated by EDX bit 11 returned by CPUID function 8000_0001h.
Invalid opcode, #UD      X     X             X          The system call extension bit (SCE) of the extended feature enable register (EFER) is set to 0. (The EFER register is MSR C000_0080h.)
General protection, #GP  X     X                        This instruction is only recognized in protected mode.
General protection, #GP                      X          CPL was not 0.
SYSRET: Return From Fast System Call

64-Bit Mode Exceptions

#UD     If IA32_EFER.SCE bit = 0.
        If the LOCK prefix is used.

#GP(0)  If CPL ≠ 0.
        If ECX contains a non-canonical address.
Ring3 to ring0 escalation

PART 3: Exploit techniques
What is a non-canonical address?

0 .. (1<<47)-1: canonical
1<<47 .. (1<<64)-(1<<47)-1: noncanonical
(1<<64)-(1<<47) .. (1<<64)-1: canonical
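
The three ranges can be checked with a few lines of code. This is a minimal sketch assuming 48-bit virtual addresses, which is what the 1<<47 boundary implies; the function and constant names are illustrative only.

```python
VA_BITS = 48  # assumption: 48 implemented virtual address bits


def is_canonical(addr):
    """True if addr lies in the lower or the upper canonical range."""
    lower_end = 1 << (VA_BITS - 1)        # 1<<47, first noncanonical address
    upper_start = (1 << 64) - lower_end   # (1<<64)-(1<<47), first address of the upper range
    return 0 <= addr < lower_end or upper_start <= addr < (1 << 64)


print(is_canonical((1 << 47) - 1))           # last address of the lower canonical range
print(is_canonical(1 << 47))                 # first noncanonical address
print(is_canonical((1 << 64) - (1 << 47)))   # first address of the upper canonical range
```

Equivalently, an address is canonical when bits 63..47 are all zeros or all ones.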
never seen?

[Figure: address space from 0 to (1<<64)-1, with "syscall" in the lower canonical region, the noncanonical region between 1<<47 and (1<<64)-(1<<47), and the upper canonical region above it]
How to force a non-canonical address?

[Figure: address space from 0 to (1<<64)-1, with "syscall" in the lower canonical region, the noncanonical region between 1<<47 and (1<<64)-(1<<47), and the upper canonical region above it]
Which OS allows the required mapping?

Linux -
Windows -
OpenBSD, NetBSD -
FreeBSD - yes
Xen with PV guests - yes
... place "syscall" at (1<<47)-2, set RSP to
something unmapped, e.g. 0
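
The choice of (1<<47)-2 follows from the instruction length: "syscall" encodes as the two bytes 0F 05, so the address of the next instruction, which is what "sysret" later returns to, is exactly 1<<47. The arithmetic is sketched below; the helper names are illustrative and 48-bit virtual addresses are assumed.

```python
SYSCALL_LEN = 2  # "syscall" is encoded as the two bytes 0F 05


def is_canonical(addr):
    # Assumes 48-bit virtual addresses.
    return addr < (1 << 47) or (1 << 64) - (1 << 47) <= addr < (1 << 64)


def return_address(syscall_addr):
    """Address of the instruction following "syscall"."""
    return syscall_addr + SYSCALL_LEN


syscall_addr = (1 << 47) - 2
print(is_canonical(syscall_addr))                    # the instruction itself is mappable
print(is_canonical(syscall_addr + SYSCALL_LEN - 1))  # so is its last byte
print(is_canonical(return_address(syscall_addr)))    # the return address is not canonical
```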
Code execution in ring0?




• One possible trick: before executing "syscall", 
set RSP to point to some important kernel 
data structure 


• When #GP is raised, processor will overwrite 
this structure with the exception record 

• Any subsequent stack pushes done by #GP 
handler will scribble over it, too 
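
To see how much memory the exception record alone touches, the following sketch computes the overwritten range for a given RSP. It assumes the 64-bit mode behaviour in which the CPU aligns the stack pointer down to 16 bytes and then pushes six 8-byte values (SS, RSP, RFLAGS, CS, RIP, error code), and it assumes 16-byte IDT gate descriptors. The IDT base value and the function names are hypothetical.

```python
FRAME_QWORDS = 6      # SS, RSP, RFLAGS, CS, RIP, error code
IDT_ENTRY_SIZE = 16   # size of one gate descriptor in 64-bit mode


def gp_frame_range(rsp):
    """Return (low, high): the exception record occupies [low, high)."""
    top = rsp & ~0xF                      # stack pointer aligned down to 16 bytes
    return top - FRAME_QWORDS * 8, top


def clobbered_idt_vectors(rsp, idt_base):
    """IDT vectors whose descriptors overlap the exception record."""
    low, high = gp_frame_range(rsp)
    first = (low - idt_base) // IDT_ENTRY_SIZE
    last = (high - 1 - idt_base) // IDT_ENTRY_SIZE
    return list(range(first, last + 1))


IDT_BASE = 0xFFFFFFFF80100000  # hypothetical value, for illustration only
print(clobbered_idt_vectors(IDT_BASE + 0x100, IDT_BASE))  # [13, 14, 15]
```

Any further pushes done by the handler extend the damaged range downwards from `low`.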
• Usually, the job of #GP handler invoked from
unexpected location in ring0 is to panic/
bugcheck the machine


• Also, #GP handler may crash because of 
running in an unexpected environment 
- Particularly, usermode gs_base 








[Figure: registers, usermode stack, "syscall", syscall handler, "sysret", #GP handler, IDT]





FreeBSD exploit demo
Exploit is reliable, and does not require
knowledge of any kernel absolute addresses 

- IDT base leaks via "sidt" 

IDT overwrite is a very generic method 
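
The "sidt" leak works because the instruction can be executed from usermode and stores a 10-byte descriptor in 64-bit mode: a 16-bit limit followed by a 64-bit base. A minimal sketch of decoding such a buffer is shown below; the sample bytes are made up for illustration and the function name is hypothetical.

```python
import struct


def parse_sidt(buf):
    """Decode the 10-byte value stored by "sidt" in 64-bit mode.

    Returns (base, limit). "<HQ" means little-endian, unpadded:
    a 16-bit limit followed by a 64-bit base.
    """
    limit, base = struct.unpack("<HQ", buf[:10])
    return base, limit


# Made-up sample: limit 0x0FFF (256 entries of 16 bytes), hypothetical base.
sample = struct.pack("<HQ", 0x0FFF, 0xFFFFFFFF80100000)
base, limit = parse_sidt(sample)
print(hex(base), (limit + 1) // 16)
```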
Usermode not allowed to map/access
memory at (1<<47)-2

Any other way to force "sysret" with a
non-canonical address?
"sysret" does not always return to just after
"syscall"

Jan Beulich of SUSE spotted it first, in the Xen
context

Sys_sigreturn, NtContinue

- Not working - they return via "iret"

Let's search ntoskrnl.exe for "sysret"
occurrences














Just as before, we can trigger #GP handler 
running with arbitrary RSP 

Unfortunately, in this case, the trick of 
pointing RSP to IDT is problematic 

- Windows 7 #GP handler uses much more stack 
space, and before #PF is triggered, kernel memory 
before IDT is corrupted 

What if we point RSP to usermode area, and 
preload this memory with _really_ evil stuff? 
#GP with usermode RSP
• Stack is treated as uninitialized memory; code
never reads from any location before writing to it

• We need to overwrite some stack location
after it was initialized by #GP handler, but
before it is used by it

• We need to create a race condition - run
another thread concurrently with the exploit
thread; so it works only on an SMP system
#GP thread actions:

- Call some_function 

- Some_function does its job 

- "ret" to some_function's return address 

"overwriter" thread actions 

- Continuously overwrite some_function's return 
address with the address of kernel shellcode 
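
The interleaving of the two threads can be modelled deterministically. This is a toy sketch, not exploit code: time is an abstract integer, the addresses are placeholders, and the function name is hypothetical. It only illustrates that an overwrite counts when it lands after "call" has stored the return address and before "ret" reads it.

```python
LEGIT_RET = 0x1000   # placeholder: genuine return address stored by "call"
SHELLCODE = 0x2000   # placeholder: address written by the overwriter thread


def address_used_by_ret(t_call, t_ret, write_times):
    """Replay the events in time order and return the value "ret" reads."""
    events = [(t_call, "call"), (t_ret, "ret")]
    events += [(t, "write") for t in write_times]
    slot = None  # the stack slot holding some_function's return address
    for _, kind in sorted(events):
        if kind == "call":
            slot = LEGIT_RET      # handler initializes the slot
        elif kind == "write":
            slot = SHELLCODE      # overwriter scribbles over it
        else:
            return slot           # "ret" consumes whatever is there now


# Overwriter writes every 7 ticks; the window is (10, 20), so the write at 14 wins.
print(address_used_by_ret(10, 20, range(0, 100, 7)) == SHELLCODE)
# Window (10, 12) contains no write, so the genuine address survives.
print(address_used_by_ret(10, 12, range(0, 100, 7)) == LEGIT_RET)
```

Writes that land before "call" are harmless, since the handler overwrites the slot itself; this is why the overwriter has to keep looping.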
Is it reliable?


"overwriter" must write in the short time 
window 


- It is only 1 assembly instruction in a loop 

"overwriter" must be scheduled to run when 
#GP handler runs 

- We can spawn many of them 

We have only one shot 

Exploit works 100% on bare metal; not in VM 
Related research




Derek Soeder, "VMware Emulation Flaw x64 
Guest Privilege Escalation", Nov 2008 


Nate Eldredge, CVE-2008-3890 


In this presentation, to fit in the timeslot, I 
ignored the issue of "swapgs" 
desynchronization; see the above for details 
- BTW, Xen does not use "swapgs" 
PART 4: Related musings

3 answers, find the one that is a joke

Whose fault is it?


Those pesky security researchers! 

"Security people are leeches". 

"I can tell you I wish those people just would 
be quiet. It would be best for the world. That's 
not going to happen, so we have to work in 
the right fashion with these security 
researchers" 
OS developers?

"sysret" semantics are explicitly documented in the
Intel SDM

After CVE-2006-0774, all developers should
have checked if it is applicable to their OSes
Intel?

Allowing "sysret" to raise an exception, with
user-controlled stack, is a design error

After CVE-2006-0774, Intel should have
realized the problem, notified everyone, and
updated the SDM with an explicit warning
Typically, kernel escalation exploits attempt to
run arbitrary code stored in usermode pages
with ring0 privilege

The SMEP feature, present in Ivy Bridge, prevents
this


Properties very similar to NX/DEP...
... with similar bypass techniques
Still, a first step in the right direction
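
For reference, SMEP is reported by CPUID leaf 7 (sub-leaf 0) in EBX bit 7 and is enabled through CR4 bit 20. The sketch below only decodes register values that have already been read by some other means; the function names are hypothetical, and reading CR4 itself requires ring0.

```python
CPUID7_EBX_SMEP = 1 << 7   # CPUID.(EAX=7, ECX=0):EBX bit 7, SMEP supported
CR4_SMEP = 1 << 20         # CR4 bit 20, SMEP enabled


def smep_supported(cpuid7_ebx):
    """True if the CPU advertises SMEP in the given CPUID leaf 7 EBX value."""
    return bool(cpuid7_ebx & CPUID7_EBX_SMEP)


def smep_enabled(cr4):
    """True if the given CR4 value has SMEP turned on."""
    return bool(cr4 & CR4_SMEP)


print(smep_supported(0x80), smep_enabled(1 << 20), smep_enabled(0))
```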





The state of mainline kernel security

Large code base implies many vulnerabilities

Often reliably exploitable 

- A lot of state shared with untrusted usermode, 
e.g. virtual memory mappings 

Encapsulating untrusted code with
hardware-assisted virtualization, if done correctly, seems
better

- What happens in V*, stays in V*
Send questions to rafal@bromium.com

Updated versions of the paper (if any) at

http://www.bromium.com/misc/astitchintimesavesnine.pdf

Bonus track: Xen exploit demo